Translingual Information Management by Natural Language Processing

Authors

  • Akitoshi Okumura
  • Hiroshi Maruyama
  • Masayuki Numao
  • Yoshiaki Shirai
Abstract

Preface

Translingual information management that can choose appropriate information and obtain useful knowledge from the flood of global information is being increasingly demanded. Among the related technologies, natural language processing is one of the most promising for meeting information needs because natural language processing can deal with documents that have essential roles in information. This research is aimed at developing a practical system for managing translingual information. The system is based on technologies for natural language processing. Translingual information management has a fundamental cycle of information needs and information search. In order to enhance the search quality and to accelerate the cycle, this research focuses on three core technologies: coordinate structure analysis, cross-language information retrieval, and information navigation.

First, we propose a model for analyzing coordinate structure. This model provides top-down scope information on the correct syntactic structure by taking advantage of the symmetric patterns of parallelism. The analysis of coordinate structure creates a bottleneck when dealing with documents because most long sentences contain a coordinate structure. The model results in a high-quality search because it enables accurate analysis.

Second, we propose the GDMAX method, which is a method for translating query terms for use in cross-language information retrieval (CLIR). CLIR is a key component in translingual information management. This method produces high-quality search results by choosing appropriate translation terms for CLIR.

Finally, in order to make translingual information management efficient, we propose a method for information navigation that classifies documents and enables a user to navigate by using 5W1H (who, when, where, what, why, how, and predicate) information.

Acknowledgments

First and foremost, I would like to thank my supervisors, Prof. Hozumi Tanaka (TITECH, Japan) and Prof. Takenobu Tokunaga (TITECH, Japan) for their guidance, support and encouragement throughout the years of my Ph.D. studentship. I would also like to thank other members of my thesis committee, Prof. was conducted jointly with him, and without his advice and proposals I would not have been able to achieve the results reported here. I also extend thanks to Mr. Shin-ichiro Kamei for his invaluable advice and discussions from viewpoints of linguistic engineering. I am also grateful to Mr. Shin-ichi Doi and Mr. Kiyoshi Yamabana for their useful discussions and suggestions about CLIR and the DMAX method. I am also deeply appreciative of the valuable support of Mr. Eduard Hovy gave me global viewpoints on natural language processing and Kevin …

Similar resources

Translingual Information Retrieval: Learning from Bilingual Corpora

Translingual information retrieval (TLIR) consists of providing a query in one language and searching document collections in one or more different languages. This paper introduces new TLIR methods and reports on comparative TLIR experiments with these new methods and with previously reported ones in a realistic setting. Methods fall into two categories: query translation and statistical-IR appr...

Translingual Information Access

We present an attempt at a coherent vision of an end-to-end translingual information retrieval system. We begin by presenting a sample of the broad range of possibilities, and the results of some initial work comparing the different approaches. We then present an overall workstation architecture, followed by two possible approaches to the actual translingual IR stage presented in detail. Rankin...

Interlingua-Based Broad-Coverage Korean-to-English Translation in CCLINC

At MIT Lincoln Laboratory, we have been developing a Korean-to-English machine translation system, CCLINC (Common Coalition Language System at Lincoln Laboratory). The CCLINC Korean-to-English translation system consists of two core modules, language understanding and generation modules mediated by a language neutral meaning representation called a semantic frame. The key features of the system i...

Translingual Information Retrieval: Learning from Bilingual Corpora (AI Journal Special Issue: Best of IJCAI-97)

Translingual information retrieval (TLIR) consists of providing a query in one language and searching document collections in one or more different languages. This paper introduces new TLIR methods and reports on comparative TLIR experiments with these new methods and with previously reported ones in a realistic setting. Methods fall into two categories: query translation and statistical-IR appr...

Translingual Mining from Text Data

Like full-text translation, cross-language information retrieval (CLIR) is a task that requires some form of knowledge transfer across languages. Although robust translation resources are critical for constructing high quality translation tools, manually constructed resources are limited both in their coverage and in their adaptability to a wide range of applications. Automatic mining of transl...

Publication date: 2007